NOTE: This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
In the last few practices/sessions, you learned about spatial point patterns. The next few sessions will concentrate on area data.
For this practice you will need the following:
This dataset includes the spatial information for the census tracts in the Hamilton Census Metropolitan Area (as polygons), and a host of demographic variables from the census of Canada, including population and languages.
In this practice, you will learn:
O’Sullivan D and Unwin D (2010) Geographic Information Analysis, 2nd Edition, Chapter 7. John Wiley & Sons: New Jersey.
As usual, it is good practice to clear the working space to make sure that you do not have extraneous items there when you begin your work. The command in R to clear the workspace is rm (for “remove”), followed by a list of items to be removed. To clear the workspace from all objects, do the following:
rm(list = ls())
Note that ls() lists all objects currently in the workspace.
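One detail worth knowing is that, by default, ls() skips objects whose names begin with a dot, so rm(list = ls()) leaves those objects in place (and it does not unload packages). The sketch below illustrates this in a throwaway environment, here called `scratch` (a name chosen only for this illustration), so that it does not touch your actual workspace:

```r
# Work in a throwaway environment instead of the global workspace
scratch <- new.env()
assign("a", 1, envir = scratch)
assign("b", "text", envir = scratch)
assign(".hidden", TRUE, envir = scratch)

# ls() lists only the visible names by default
visible_before <- ls(envir = scratch)

# Same idea as rm(list = ls()), restricted to the scratch environment
rm(list = ls(envir = scratch), envir = scratch)

visible_after <- ls(envir = scratch)
all_after <- ls(envir = scratch, all.names = TRUE)

print(visible_before)  # visible objects before clearing
print(visible_after)   # visible objects after clearing
print(all_after)       # dot-prefixed objects that survived
```

If you also want to remove dot-prefixed objects, ls(all.names = TRUE) can be passed to rm instead.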
Load the libraries you will use in this activity:
library(tidyverse)
-- Attaching packages --------------------------------------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.0.0 v purrr 0.2.5
v tibble 1.4.2 v dplyr 0.7.5
v tidyr 0.8.1 v stringr 1.3.1
v readr 1.1.1 v forcats 0.3.0
package 'ggplot2' was built under R version 3.4.4
package 'tidyr' was built under R version 3.4.4
package 'purrr' was built under R version 3.4.4
package 'dplyr' was built under R version 3.4.4
package 'stringr' was built under R version 3.4.4
package 'forcats' was built under R version 3.4.4
-- Conflicts ------------------------------------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(rgdal)
package 'rgdal' was built under R version 3.4.4
Loading required package: sp
package 'sp' was built under R version 3.4.4
rgdal: version: 1.3-2, (SVN revision 755)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
Path to GDAL shared files: C:/Users/Antonio/Documents/R/win-library/3.4/rgdal/gdal
GDAL binary built with GEOS: TRUE
Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
Path to PROJ.4 shared files: C:/Users/Antonio/Documents/R/win-library/3.4/rgdal/proj
Linking to sp version: 1.3-1
library(broom)
package 'broom' was built under R version 3.4.4
library(plotly)
package 'plotly' was built under R version 3.4.4
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(cartogram)
package 'cartogram' was built under R version 3.4.4
library(gridExtra)
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
combine
Read the data that you will use for this practice. This is an Esri shapefile that will be saved as an object of class SpatialPolygonsDataFrame. The function used to read Esri shapefiles is rgdal::readOGR. Setting integer64 to “allow.loss” reads 64-bit integer fields as ordinary (32-bit) integers, as opposed to changing them to factors or strings:
Hamilton_CT <- readOGR(".", layer = "Hamilton CMA CT", integer64 = "allow.loss")
OGR data source with driver: ESRI Shapefile
Source: "C:\Antonio\Courses\GEOG 4GA3 - Applied Spatial Analysis\Spatial-Statistics-Course\10. Area Data I\01. Readings and Practice", layer: "Hamilton CMA CT"
with 188 features
It has 255 fields
Integer64 fields read as signed 32-bit integers: ID POPULATION PRIVATE_DW OCCUPIED_D ALL_AGES AGE_4 AGE_5_TO_9 AGE_10_TO_ AGE_15_TO_ AGE_15 AGE_16 AGE_17 AGE_18 AGE_19 AGE_20_TO_ AGE_25_TO_ AGE_30_TO_ AGE_35_TO_ AGE_40_TO_ AGE_45_TO_ AGE_50_TO_ AGE_55_TO_ AGE_60_TO_ AGE_65_TO_ AGE_70_TO_ AGE_75_TO_ AGE_80_TO_ AGE_85 MEDIAN_AGE MALE_ALL_A MALE_4 MALE_5_TO_ MALE_10_TO MALE_15_TO MALE_15 MALE_16 MALE_17 MALE_18 MALE_19 MALE_20_TO MALE_25_TO MALE_30_TO MALE_35_TO MALE_40_TO MALE_45_TO MALE_50_TO MALE_55_TO MALE_60_TO MALE_65_TO MALE_70_TO MALE_75_TO MALE_80_TO MALE_85 MALE_MEDIA FEMALE_ALL FEMALE_4 FEMALE_5_T FEMALE_10_ FEMALE_15_ FEMALE_15 FEMALE_16 FEMALE_17 FEMALE_18 FEMALE_19 FEMALE_20_ FEMALE_25_ FEMALE_30_ FEMALE_35_ FEMALE_40_ FEMALE_45_ FEMALE_50_ FEMALE_55_ FEMALE_60_ FEMALE_65_ FEMALE_70_ FEMALE_75_ FEMALE_80_ FEMALE_85 FEMALE_MED MARRIED_AG MARRIED_OR MARRIED COMMON_LAW UNMARRIED SINGLE SEPARATED DIVORCED WIDOWED MARRIED_A1 MARRIED_O1 MARRIED_M COMMON_LA1 UNMARRIED_ SINGLE_M SEPARATED_ DIVORCED_M WIDOWED_M MARRIED_A2 MARRIED_O2 MARRIED_F COMMON_LA2 UNMARRIED1 SINGLE_F SEPARATED1 DIVORCED_F WIDOWED_F FAMILIES_I FAMILY_SIZ FAMILY_SI1 FAMILY_SI2 FAMILY_SI3 COUPLE_FAM COUPLE_MAR COUPLE_MA1 COUPLE_MA2 COUPLE_MA3 COUPLE_MA4 COUPLE_MA5 COUPLE_COM COUPLE_CO1 COUPLE_CO2 COUPLE_CO3 COUPLE_CO4 COUPLE_CO5 SINGLE_PAR SINGLE_PA1 SINGLE_PA2 SINGLE_PA3 SINGLE_PA4 SINGLE_PA5 SINGLE_PA6 SINGLE_PA7 SINGLE_PA8 CHILDREN_F CHILDREN_1 CHILDREN_2 CHILDREN_3 CHILDREN_4 CHILDREN_5 POPULATIO1 POPULATIO2 POPULATIO3 POPULATIO4 POPULATIO5 POPULATIO6 POPULATIO7 POPULATIO8 POPULATIO9 POPULATI10 POPULATI11 POPULATI12 PRIVATE_HO PRIVATE_HH PRIVATE_H1 PRIVATE_H2 PRIVATE_H3 PRIVATE_H4 PRIVATE_H5 PRIVATE_H6 PRIVATE_H7 PRIVATE_H8 PRIVATE_H9 PRIVATE_10 PRIVATE_11 PRIVATE_12 PRIVATE_13 PRIVATE_14 PRIVATE_15 OCC_PRIVAT OCC_PRIVA1 OCC_PRIVA2 OCC_PRIVA3 OCC_PRIVA4 OCC_PRIVA5 OCC_PRIVA6 OCC_PRIVA7 OCC_PRIVA8 OCC_PRIVA9 PRIVATE_16 PRIVATE_17 PRIVATE_18 PRIVATE_19 PRIVATE_20 PRIVATE_21 PRIVATE_22 
PRIVATE_23 NATIVE_LAN NATIVE_LA1 NATIVE_LA2 NATIVE_LA3 NATIVE_LA4 NATIVE_LA5 NATIVE_LA6 NATIVE_LA7 NATIVE_LA8 NATIVE_LA9 NATIVE_L10 NATIVE_L11 NATIVE_L12 NATIVE_L13 NATIVE_L14 NATIVE_L15 NATIVE_L16 NATIVE_L17 NATIVE_L18 NATIVE_L19 NATIVE_L20 NATIVE_L21 NATIVE_L22 NATIVE_L23 NATIVE_L24 NATIVE_L25 NATIVE_L26 NATIVE_L27 NATIVE_L28 NATIVE_L29 NATIVE_L30 NATIVE_L31 NATIVE_L32 NATIVE_L33 NATIVE_L34 NATIVE_L35 NATIVE_L36 NATIVE_L37 NATIVE_L38 NATIVE_L39 NATIVE_L40 NATIVE_L41 NATIVE_L42 NATIVE_L43 NATIVE_L44 NATIVE_L45 NATIVE_L46 NATIVE_L47 NATIVE_L48 NATIVE_L49 NATIVE_L50 NATIVE_L51 NATIVE_L52 NATIVE_L53 NATIVE_L54 NATIVE_L55 NATIVE_L56 NATIVE_L57 NATIVE_L58 NATIVE_L59
To use the plotting functions of ggplot2, the SpatialPolygonsDataFrame needs to be “tidied” by means of the tidy function of the broom package:
Hamilton_CT.t <- tidy(Hamilton_CT, region = "TRACT")
Hamilton_CT.t <- dplyr::rename(Hamilton_CT.t, TRACT = id)
Tidying the spatial dataframe strips it of the non-spatial information, but we can add all the data back by means of the left_join function:
Hamilton_CT.t <- left_join(Hamilton_CT.t, Hamilton_CT@data, by = "TRACT")
Column `TRACT` joining character vector and factor, coercing into character vector
Now the tidy dataframe Hamilton_CT.t contains the spatial information and the data.
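The coercion message printed by the join appears because the key TRACT is a character vector in one table and a factor in the other. The following is a minimal sketch of the same join on toy data, with the key converted explicitly beforehand. It uses base R's merge as a stand-in for left_join so that it runs without extra packages; the tables `vertices` and `tract_data` and their values are hypothetical:

```r
# Toy stand-in for a tidied polygon table: several vertices per tract
vertices <- data.frame(
  TRACT = c("A", "A", "A", "B", "B", "B"),
  long  = c(0, 1, 0, 1, 2, 1),
  lat   = c(0, 0, 1, 0, 0, 1),
  order = 1:6,
  stringsAsFactors = FALSE
)

# Toy stand-in for the attribute table: one row per tract, key stored as a factor
tract_data <- data.frame(
  TRACT = factor(c("A", "B")),
  POPULATION = c(1200, 3400)
)

# Convert the key to character so that both tables use the same type
tract_data$TRACT <- as.character(tract_data$TRACT)

# all.x = TRUE keeps every vertex, as a left join would
joined <- merge(vertices, tract_data, by = "TRACT", all.x = TRUE)

# merge may reorder rows; restore the drawing order of the vertices
joined <- joined[order(joined$order), ]
print(joined)
```

Every vertex of a tract receives that tract's attributes, which is why the tidied table has many more rows than there are tracts.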
You can quickly verify the contents of the dataframe by means of summary:
summary(Hamilton_CT.t)
long lat order hole piece group
Min. :-80.25 Min. :43.05 Min. : 1 Mode :logical 1:29212 5370124.00.1: 822
1st Qu.:-79.93 1st Qu.:43.21 1st Qu.: 7304 FALSE:29212 5370121.00.1: 661
Median :-79.86 Median :43.24 Median :14606 5370142.01.1: 642
TRACT ID AREA COLORING CMA PROVINCE
Length:29212 Min. : 919807 Min. : 0.3154 Min. :0.000 537:29212 00:14311
Class :character 1st Qu.: 920233 1st Qu.: 1.2217 1st Qu.:0.000 01: 9506
Mode :character Median : 937830 Median : 2.6824 Median :2.000 02: 4394
NAME ABBREV POPULATION PRIVATE_DW OCCUPIED_D LAND_AREA
0124.00: 822 ON:29212 Min. : 5 Min. : 0 Min. : 0 Min. : 0.32
0121.00: 661 1st Qu.: 2756 1st Qu.:1191 1st Qu.:1170 1st Qu.: 1.22
0142.01: 642 Median : 3901 Median :1526 Median :1436 Median : 2.62
POP_DENSIT ALL_AGES AGE_4 AGE_5_TO_9 AGE_10_TO_ AGE_15_TO_
Min. : 2.591 Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
1st Qu.: 254.658 1st Qu.: 2755 1st Qu.: 115.0 1st Qu.: 125.0 1st Qu.:140.0 1st Qu.:170.0
Median : 1511.957 Median : 3905 Median : 175.0 Median : 195.0 Median :220.0 Median :260.0
AGE_15 AGE_16 AGE_17 AGE_18 AGE_19 AGE_20_TO_
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0
1st Qu.: 30.00 1st Qu.: 30.00 1st Qu.: 35.00 1st Qu.: 35.00 1st Qu.: 35.00 1st Qu.:180.0
Median : 45.00 Median : 55.00 Median : 50.00 Median : 55.00 Median : 55.00 Median :245.0
AGE_25_TO_ AGE_30_TO_ AGE_35_TO_ AGE_40_TO_ AGE_45_TO_ AGE_50_TO_
Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.0
1st Qu.:135.0 1st Qu.: 125.0 1st Qu.: 145.0 1st Qu.: 180.0 1st Qu.:215 1st Qu.:220.0
Median :205.0 Median : 185.0 Median : 205.0 Median : 245.0 Median :300 Median :310.0
AGE_55_TO_ AGE_60_TO_ AGE_65_TO_ AGE_70_TO_ AGE_75_TO_ AGE_80_TO_
Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
1st Qu.:190.0 1st Qu.:160.0 1st Qu.:130.0 1st Qu.: 95.0 1st Qu.: 75.0 1st Qu.: 50.0
Median :300.0 Median :260.0 Median :185.0 Median :140.0 Median :105.0 Median : 85.0
AGE_85 MEDIAN_AGE MALE_ALL_A MALE_4 MALE_5_TO_ MALE_10_TO
Min. : 0.00 Min. : 0.00 Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0
1st Qu.: 35.00 1st Qu.:38.00 1st Qu.:1345 1st Qu.: 60.0 1st Qu.: 65.0 1st Qu.: 75.0
Median : 75.00 Median :43.00 Median :1925 Median : 90.0 Median :100.0 Median :120.0
MALE_15_TO MALE_15 MALE_16 MALE_17 MALE_18 MALE_19
Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.00
1st Qu.: 85.0 1st Qu.:15.00 1st Qu.:15.00 1st Qu.: 15.0 1st Qu.: 15.0 1st Qu.: 20.00
Median :130.0 Median :25.00 Median :25.00 Median : 25.0 Median : 30.0 Median : 30.00
MALE_20_TO MALE_25_TO MALE_30_TO MALE_35_TO MALE_40_TO MALE_45_TO
Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0.0
1st Qu.: 90.0 1st Qu.: 70.0 1st Qu.: 60.0 1st Qu.: 70 1st Qu.: 90.0 1st Qu.:105.0
Median :125.0 Median :105.0 Median : 85.0 Median :100 Median :115.0 Median :150.0
MALE_50_TO MALE_55_TO MALE_60_TO MALE_65_TO MALE_70_TO MALE_75_TO
Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.:110.0 1st Qu.: 95 1st Qu.: 75.0 1st Qu.: 60.00 1st Qu.: 45.00 1st Qu.: 35.00
Median :145.0 Median :140 Median :125.0 Median : 90.00 Median : 65.00 Median : 50.00
MALE_80_TO MALE_85 MALE_MEDIA FEMALE_ALL FEMALE_4 FEMALE_5_T
Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0 Min. : 0.0 Min. : 0.0
1st Qu.: 20.00 1st Qu.: 15.0 1st Qu.:37.00 1st Qu.:1405 1st Qu.: 50.0 1st Qu.: 60.0
Median : 35.00 Median : 25.0 Median :42.00 Median :1920 Median : 85.0 Median : 90.0
FEMALE_10_ FEMALE_15_ FEMALE_15 FEMALE_16 FEMALE_17 FEMALE_18
Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.00
1st Qu.: 70.0 1st Qu.: 80.0 1st Qu.:15.00 1st Qu.:15.00 1st Qu.: 15.0 1st Qu.: 15.00
Median :105.0 Median :125.0 Median :25.00 Median :25.00 Median : 25.0 Median : 25.00
FEMALE_19 FEMALE_20_ FEMALE_25_ FEMALE_30_ FEMALE_35_ FEMALE_40_
Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
1st Qu.: 15.00 1st Qu.: 80.0 1st Qu.: 65.0 1st Qu.: 65.0 1st Qu.: 75.0 1st Qu.: 90.0
Median : 25.00 Median :120.0 Median :100.0 Median : 95.0 Median :105.0 Median :130.0
FEMALE_45_ FEMALE_50_ FEMALE_55_ FEMALE_60_ FEMALE_65_ FEMALE_70_
Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.00
1st Qu.:105.0 1st Qu.:110 1st Qu.: 95.0 1st Qu.: 85.0 1st Qu.: 65.0 1st Qu.: 50.00
Median :155.0 Median :160 Median :155.0 Median :135.0 Median : 90.0 Median : 70.00
FEMALE_75_ FEMALE_80_ FEMALE_85 FEMALE_MED MARRIED_AG MARRIED_OR
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0 Min. : 0
1st Qu.: 40.00 1st Qu.: 25.00 1st Qu.: 20.00 1st Qu.:39.00 1st Qu.:2355 1st Qu.:1250
Median : 60.00 Median : 45.00 Median : 45.00 Median :43.00 Median :3230 Median :1995
MARRIED COMMON_LAW UNMARRIED SINGLE SEPARATED DIVORCED
Min. : 0 Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0.00 Min. : 0.0
1st Qu.:1065 1st Qu.:145.0 1st Qu.:1035 1st Qu.: 660.0 1st Qu.: 55.00 1st Qu.:110.0
Median :1715 Median :210.0 Median :1335 Median : 840.0 Median : 85.00 Median :175.0
WIDOWED MARRIED_A1 MARRIED_O1 MARRIED_M COMMON_LA1 UNMARRIED_
Min. : 0.0 Min. : 0 Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0
1st Qu.:120.0 1st Qu.:1145 1st Qu.: 620 1st Qu.: 530.0 1st Qu.: 70.0 1st Qu.: 475.0
Median :180.0 Median :1595 Median :1000 Median : 860.0 Median :105.0 Median : 585.0
SINGLE_M SEPARATED_ DIVORCED_M WIDOWED_M MARRIED_A2 MARRIED_O2
Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0 Min. : 0
1st Qu.: 355.0 1st Qu.: 25.00 1st Qu.: 45.00 1st Qu.: 25.00 1st Qu.:1225 1st Qu.: 625
Median : 450.0 Median : 35.00 Median : 60.00 Median : 40.00 Median :1675 Median :1000
MARRIED_F COMMON_LA2 UNMARRIED1 SINGLE_F SEPARATED1 DIVORCED_F
Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.0
1st Qu.: 535.0 1st Qu.: 70.0 1st Qu.: 525.0 1st Qu.: 285.0 1st Qu.: 25.00 1st Qu.: 60.0
Median : 855.0 Median :105.0 Median : 680.0 Median : 385.0 Median : 55.00 Median :110.0
WIDOWED_F FAMILIES_I FAMILY_SIZ FAMILY_SI1 FAMILY_SI2 FAMILY_SI3 COUPLE_FAM
Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0 Min. : 0 Min. : 0.0 Min. : 0
1st Qu.: 95.0 1st Qu.: 765 1st Qu.: 385.0 1st Qu.:165 1st Qu.:135 1st Qu.: 60.0 1st Qu.: 605
Median :140.0 Median :1105 Median : 515.0 Median :235 Median :225 Median : 90.0 Median : 985
COUPLE_MAR COUPLE_MA1 COUPLE_MA2 COUPLE_MA3 COUPLE_MA4 COUPLE_MA5
Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0
1st Qu.: 530.0 1st Qu.: 225.0 1st Qu.: 265 1st Qu.:105.0 1st Qu.:115.0 1st Qu.: 50.0
Median : 840.0 Median : 360.0 Median : 430 Median :170.0 Median :185.0 Median : 80.0
COUPLE_COM COUPLE_CO1 COUPLE_CO2 COUPLE_CO3 COUPLE_CO4 COUPLE_CO5
Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.000
1st Qu.: 70.0 1st Qu.: 40.00 1st Qu.: 25.00 1st Qu.:10.00 1st Qu.:10.00 1st Qu.: 5.000
Median :105.0 Median : 60.00 Median : 45.00 Median :20.00 Median :15.00 Median : 5.000
SINGLE_PAR SINGLE_PA1 SINGLE_PA2 SINGLE_PA3 SINGLE_PA4 SINGLE_PA5
Min. : 0.0 Min. : 0 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.:105.0 1st Qu.: 75 1st Qu.: 45.00 1st Qu.: 20.00 1st Qu.: 5.00 1st Qu.:25.00
Median :165.0 Median :135 Median : 75.00 Median : 35.00 Median : 15.00 Median :40.00
SINGLE_PA6 SINGLE_PA7 SINGLE_PA8 CHILDREN_F CHILDREN_1 CHILDREN_2
Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0 Min. : 0 Min. : 0.0
1st Qu.:15.00 1st Qu.: 5.000 1st Qu.: 0.000 1st Qu.: 805 1st Qu.: 140 1st Qu.: 240.0
Median :25.00 Median :10.000 Median : 5.000 Median :1195 Median : 210 Median : 375.0
CHILDREN_3 CHILDREN_4 CHILDREN_5 POPULATIO1 POPULATIO2 POPULATIO3
Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0.00
1st Qu.: 95 1st Qu.:180.0 1st Qu.:115.0 1st Qu.: 2755 1st Qu.: 295.0 1st Qu.: 50.00
Median :140 Median :290.0 Median :160.0 Median : 3865 Median : 425.0 Median : 70.00
POPULATIO4 POPULATIO5 POPULATIO6 POPULATIO7 POPULATIO8 POPULATIO9
Min. : 0.00 Min. : 0 Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.00
1st Qu.: 50.00 1st Qu.: 170 1st Qu.: 2175 1st Qu.: 390.0 1st Qu.:105.0 1st Qu.:20.00
Median : 70.00 Median : 295 Median : 3340 Median : 545.0 Median :150.0 Median :30.00
POPULATI10 POPULATI11 POPULATI12 PRIVATE_HO PRIVATE_HH PRIVATE_H1
Min. : 0.000 Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0 Min. : 0
1st Qu.: 5.000 1st Qu.: 70.0 1st Qu.: 260.0 1st Qu.:1170 1st Qu.: 750 1st Qu.: 670
Median :10.000 Median :110.0 Median : 385.0 Median :1435 Median :1075 Median :1015
PRIVATE_H2 PRIVATE_H3 PRIVATE_H4 PRIVATE_H5 PRIVATE_H6 PRIVATE_H7
Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00
1st Qu.: 550 1st Qu.: 260.0 1st Qu.: 275.0 1st Qu.: 80.0 1st Qu.: 55.00 1st Qu.: 40.00
Median : 885 Median : 385.0 Median : 435.0 Median :145.0 Median : 90.00 Median : 60.00
PRIVATE_H8 PRIVATE_H9 PRIVATE_10 PRIVATE_11 PRIVATE_12 PRIVATE_13
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0
1st Qu.: 25.00 1st Qu.:10.00 1st Qu.: 15.00 1st Qu.:10.00 1st Qu.: 15.00 1st Qu.: 190.0
Median : 40.00 Median :15.00 Median : 25.00 Median :15.00 Median : 30.00 Median : 330.0
PRIVATE_14 PRIVATE_15 OCC_PRIVAT OCC_PRIVA1 OCC_PRIVA2 OCC_PRIVA3
Min. : 0.0 Min. : 0.00 Min. : 0 Min. : 0 Min. : 0.0 Min. : 0.000
1st Qu.: 170.0 1st Qu.: 20.00 1st Qu.:1170 1st Qu.: 615 1st Qu.: 0.0 1st Qu.: 0.000
Median : 300.0 Median : 30.00 Median :1435 Median : 945 Median : 0.0 Median : 0.000
OCC_PRIVA4 OCC_PRIVA5 OCC_PRIVA6 OCC_PRIVA7 OCC_PRIVA8 OCC_PRIVA9
Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.000
1st Qu.: 100.0 1st Qu.: 5.00 1st Qu.: 0.0 1st Qu.: 10.00 1st Qu.: 5.00 1st Qu.: 0.000
Median : 365.0 Median : 15.00 Median : 85.0 Median : 20.00 Median : 45.00 Median : 0.000
PRIVATE_16 PRIVATE_17 PRIVATE_18 PRIVATE_19 PRIVATE_20 PRIVATE_21
Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
1st Qu.:1170 1st Qu.: 175.0 1st Qu.: 380.0 1st Qu.:170.0 1st Qu.:145.0 1st Qu.: 50.0
Median :1435 Median : 300.0 Median : 470.0 Median :230.0 Median :225.0 Median : 80.0
PRIVATE_22 PRIVATE_23 NATIVE_LAN NATIVE_LA1 NATIVE_LA2 NATIVE_LA3
Min. : 0.00 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0.00
1st Qu.: 25.00 1st Qu.: 2755 1st Qu.: 2755 1st Qu.: 2725 1st Qu.:2120 1st Qu.: 30.00
Median : 40.00 Median : 3865 Median : 3865 Median : 3840 Median :3035 Median : 50.00
NATIVE_LA4 NATIVE_LA5 NATIVE_LA6 NATIVE_LA7 NATIVE_LA8 NATIVE_LA9 NATIVE_L10
Min. : 0.0 Min. :0.0000 Min. :0 Min. :0 Min. :0 Min. :0 Min. :0
1st Qu.: 380.0 1st Qu.:0.0000 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
Median : 645.0 Median :0.0000 Median :0 Median :0 Median :0 Median :0 Median :0
NATIVE_L11 NATIVE_L12 NATIVE_L13 NATIVE_L14 NATIVE_L15 NATIVE_L16 NATIVE_L17
Min. :0 Min. :0.00000 Min. :0 Min. :0 Min. : 0.0 Min. : 0.0000 Min. : 0.000
1st Qu.:0 1st Qu.:0.00000 1st Qu.:0 1st Qu.:0 1st Qu.: 375.0 1st Qu.: 0.0000 1st Qu.: 0.000
Median :0 Median :0.00000 Median :0 Median :0 Median : 635.0 Median : 0.0000 Median : 0.000
NATIVE_L18 NATIVE_L19 NATIVE_L20 NATIVE_L21 NATIVE_L22 NATIVE_L23
Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.00 Min. : 0.000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 5.00 1st Qu.: 0.000 1st Qu.: 0.000
Median : 0.000 Median : 0.000 Median : 0.0000 Median : 15.00 Median : 0.000 Median : 0.000
NATIVE_L24 NATIVE_L25 NATIVE_L26 NATIVE_L27 NATIVE_L28 NATIVE_L29
Min. : 0.000 Min. :0.00000 Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.0000
1st Qu.: 0.000 1st Qu.:0.00000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000
Median : 0.000 Median :0.00000 Median : 0.0000 Median : 0.000 Median : 0.000 Median : 0.0000
NATIVE_L30 NATIVE_L31 NATIVE_L32 NATIVE_L33 NATIVE_L34 NATIVE_L35
Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 5.00 1st Qu.: 0.000 1st Qu.: 5.00 1st Qu.: 0.000 1st Qu.: 0.000
Median : 5.000 Median : 10.00 Median : 0.000 Median : 15.00 Median : 5.000 Median : 0.000
NATIVE_L36 NATIVE_L37 NATIVE_L38 NATIVE_L39 NATIVE_L40 NATIVE_L41
Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. :0.00000 Min. : 0.00
1st Qu.: 10.00 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.:0.00000 1st Qu.: 25.00
Median : 25.00 Median : 0.000 Median : 0.000 Median : 0.0000 Median :0.00000 Median : 40.00
NATIVE_L42 NATIVE_L43 NATIVE_L44 NATIVE_L45 NATIVE_L46 NATIVE_L47
Min. : 0.00 Min. : 0.000 Min. :0.00000 Min. : 0.000 Min. : 0.000 Min. : 0.00
1st Qu.: 5.00 1st Qu.: 0.000 1st Qu.:0.00000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:10.00
Median :10.00 Median : 5.000 Median :0.00000 Median : 0.000 Median : 5.000 Median :20.00
NATIVE_L48 NATIVE_L49 NATIVE_L50 NATIVE_L51 NATIVE_L52 NATIVE_L53
Min. : 0.0000 Min. : 0.0000 Min. : 0.0 Min. : 0.000 Min. : 0.000 Min. : 0.00
1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 35.0 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
Median : 0.0000 Median : 0.0000 Median : 70.0 Median : 0.000 Median : 0.000 Median : 5.00
NATIVE_L54 NATIVE_L55 NATIVE_L56 NATIVE_L57 NATIVE_L58 NATIVE_L59
Min. : 0.0 Min. : 0.000 Min. : 0.00 Min. :0.0000 Min. : 0.000 Min. : 0.000
1st Qu.: 0.0 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.000
Median : 0.0 Median : 0.000 Median : 0.00 Median :0.0000 Median : 5.000 Median : 0.000
[ reached getOption("max.print") -- omitted 4 rows ]
Every phenomenon can be measured at a location (ask yourself, what exists outside of space?).
In point pattern analysis, the unit of support is the point, and the source of randomness is the location itself. Many other forms of data are also collected at points. For instance, when the census collects information on population, at its most basic, the information can be georeferenced to an address, that is, a point.
In numerous applications, however, data are not reported at their fundamental unit of support, but rather are aggregated to some other geometry, for instance an area. This is done for several reasons, including the privacy and confidentiality of the data. Instead of reporting individual-level information, the information is reported for zoning systems that often are devised without consideration to any underlying social, natural, or economic processes.
Census data, for instance, are reported at different levels of geography. In Canada, the smallest publicly available geography is called a Dissemination Area or DA. A DA in Canada contains a population between 400 and 700 persons. Thus, instead of reporting that one or more persons are located at a point (i.e., an address), the census reports the population for the DA. Other data are aggregated in similar ways (income, residential status, etc.).
At the highest level of aggregation, national level statistics are reported, for instance Gross Domestic Product, or GDP. Economic production is not evenly distributed across space; however, the national GDP does not distinguish regional variations in this process.
Ideally, a data analyst would work with data at their most fundamental support. This is not always possible, and therefore many techniques have been developed to work with data that have been aggregated to zones.
When working with areas, it is less practical to identify the area with the coordinates (as we did with points). After all, areas will be composed of lines and reporting all the relevant coordinates is impractical. Sometimes the geometric centroids of the areas are used instead.
More commonly, areas are assigned an index or unique identifier, so that a region will typically consist of a set of \(n\) areas as follows: \[ R = A_1 \cup A_2 \cup A_3 \cup ...\cup A_n. \]
The above is read as “the Region R is the union of Areas 1 to n”.
Each area \(A_i\) can have a set of \(k\) attributes or variables associated with it, for instance: \[ \textbf{X}_i=[x_{i1}, x_{i2}, x_{i3},...,x_{ik}] \]
These attributes will typically be counts (e.g., number of people in a DA), or some summary measure of the underlying data (e.g., mean commute time).
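As a small illustration of this notation, the sketch below stores a toy region of \(n = 3\) areas with \(k = 2\) attributes as a data frame, one row per area. The identifiers and values are made up for the example:

```r
# A toy region made of n = 3 areas, each with k = 2 attributes
region <- data.frame(
  area_id = c("A1", "A2", "A3"),
  population = c(450, 620, 510),          # a count
  mean_commute_min = c(22.5, 31.0, 18.2), # a summary measure
  stringsAsFactors = FALSE
)

n <- nrow(region)       # number of areas
k <- ncol(region) - 1   # number of attributes (all columns except the identifier)

# Attribute vector X_i for the second area (i = 2)
X_2 <- unlist(region[region$area_id == "A2", c("population", "mean_commute_min")])
print(X_2)
```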
Imagine that data on income by household were collected as follows:
df <- data.frame(x = c(0.3, 0.4, 0.5, 0.6, 0.7), y = c(0.1, 0.4, 0.2, 0.5, 0.3), Income = c(30000, 30000, 100000, 100000, 100000))
Households are geocoded as points with coordinates x and y, whereas income is in dollars.
Plot the income as points (hover over the points to see the attributes):
p <- ggplot(data = df, aes(x = x, y = y, color = Income)) +
geom_point(shape = 17, size = 5) +
coord_fixed()
ggplotly(p)
The underlying process is one of income sorting, with lower incomes to the west, and higher incomes to the east. This could be due to a geographical feature of the landscape (for instance, an escarpment), or the distribution of the housing stock (with a neighborhood that has more expensive houses). These are examples of a variable that responds to a common environmental factor. As an alternative, people may display a preference towards being near others that are similar to them (this is called homophily). When this happens, the variable responds to itself in space.
The quality of similarity or dissimilarity between neighboring observations of the same variable in space is called spatial autocorrelation. You will learn more about this later on.
Another reason why variables reported for areas could display similarities in space is as a consequence of the zoning system.
Suppose for a moment that the data above can only be reported at the zonal level, perhaps because of privacy and confidentiality concerns. Thanks to the great talent of the designers of the zoning system (or a felicitous coincidence!), the zoning system is such that it is consistent with the underlying process of sorting. The zones, therefore, are as follows:
zones1 <- data.frame(x1=c(0.2, 0.45), x2=c(0.45, 0.80), y1=c(0.0, 0.0), y2=c(0.6, 0.6), Zone_ID = c('1','2'))
If you add these zones to the plot:
p <- ggplot() +
geom_rect(data = zones1, mapping = aes(xmin = x1, xmax = x2, ymin = y1, ymax = y2, fill = Zone_ID), alpha = 0.3) +
geom_point(data = df, aes(x = x, y = y, color = Income), shape = 17, size = 5) +
coord_fixed()
ggplotly(p)
What is the mean income in zone 1? What is the mean income in zone 2? Not only are the summary measures of income highly representative of the observations they describe, the two zones are also highly distinct.
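One way to check the answer is to assign each household to a zone and average income by zone. The sketch below is a base R illustration, assuming (as is the case here) that both zones span the same range of y, so that the x coordinate alone decides zone membership:

```r
# Household data and zoning system, as defined in the practice
df <- data.frame(x = c(0.3, 0.4, 0.5, 0.6, 0.7),
                 y = c(0.1, 0.4, 0.2, 0.5, 0.3),
                 Income = c(30000, 30000, 100000, 100000, 100000))
zones1 <- data.frame(x1 = c(0.2, 0.45), x2 = c(0.45, 0.80),
                     y1 = c(0.0, 0.0), y2 = c(0.6, 0.6),
                     Zone_ID = c("1", "2"))

# Assign each household to the zone whose x-range contains it
df$Zone_ID <- cut(df$x,
                  breaks = c(zones1$x1[1], zones1$x2),
                  labels = as.character(zones1$Zone_ID))

# Mean income by zone
zone_means1 <- aggregate(Income ~ Zone_ID, data = df, FUN = mean)
print(zone_means1)
```

With this zoning system every household in a zone has the same income, so each zonal mean coincides with the individual observations it summarizes.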
Imagine now that for whatever reason (lack of prior knowledge of the process, convenience for data collection, etc.) the zones instead are as follows:
zones2 <- data.frame(x1=c(0.2, 0.55), x2=c(0.55, 0.80), y1=c(0.0, 0.0), y2=c(0.6, 0.6), Zone_ID = c('1','2'))
If you plot these zones:
p <- ggplot() +
geom_rect(data = zones2, mapping = aes(xmin = x1, xmax = x2, ymin = y1, ymax = y2, fill = Zone_ID), alpha = 0.3) +
geom_point(data = df, aes(x = x, y = y, color = Income), shape = 17, size = 5) +
coord_fixed()
ggplotly(p)
What is now the mean income of zone 1? What is the mean income of zone 2? The observations have not changed, and the generating spatial process remains the same. You will notice, however, that the summary measures for the two zones are more similar in this case than they were when the zones more closely captured the underlying process.
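The effect of moving the boundary can be quantified by computing the zonal means under both zoning systems and comparing the gap between zones. The helper `mean_by_zone` below is a hypothetical function written for this illustration; it again relies on the x coordinate alone, since both zones cover the same range of y:

```r
# Household data, as defined in the practice
df <- data.frame(x = c(0.3, 0.4, 0.5, 0.6, 0.7),
                 y = c(0.1, 0.4, 0.2, 0.5, 0.3),
                 Income = c(30000, 30000, 100000, 100000, 100000))

# x-coordinates of the zone boundaries under each zoning system
breaks_zones1 <- c(0.2, 0.45, 0.80)
breaks_zones2 <- c(0.2, 0.55, 0.80)

# Hypothetical helper: mean income by zone for a given set of boundaries
mean_by_zone <- function(data, breaks) {
  zone <- cut(data$x, breaks = breaks, labels = c("1", "2"))
  tapply(data$Income, zone, mean)
}

means1 <- mean_by_zone(df, breaks_zones1)
means2 <- mean_by_zone(df, breaks_zones2)

# Difference between the two zonal means under each system
gap1 <- as.numeric(means1["2"] - means1["1"])
gap2 <- as.numeric(means2["2"] - means2["1"])

print(means1)
print(means2)
print(c(gap1 = gap1, gap2 = gap2))
```

Under the second system one high-income household falls in zone 1, which pulls that zone's mean upward and narrows the gap between zones even though no observation has changed.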
Perhaps the first step when working with spatial area data is to visualize them.
Commonly, area data are visualized by means of choropleth maps. A choropleth map is a map of the polygons that form the areas in the region, each colored in a way to represent the value of an underlying variable.
Let's use ggplot2 to create a choropleth map of population in Hamilton. Notice that the fill color for the polygons is given by cutting the values of POPULATION into five classes with approximately equal numbers of observations. In other words, the colors represent zones in the bottom 20% of population, zones in the next 20%, and so on, so that the darkest zones are those with populations so large as to be in the top 20% of the population distribution:
ggplot() + geom_polygon(data = Hamilton_CT.t, aes(x = long, y = lat, group = group, fill = cut_number(POPULATION, 5)),color = NA, size = 0.1) +
scale_fill_brewer(palette = "YlOrRd") +
coord_fixed() +
theme(legend.position = "bottom") +
labs(fill = "Population")
Inspecting the map above, would you say that the distribution of population is random, or not random? If not random, what do you think might be an underlying process for the distribution of population?
Often, creating a choropleth map using the absolute value of a variable can be somewhat misleading. As seen in the map above, the zones with the largest population are also usually large zones. Any process that you might think of will be confounded by the size of the zones. For this reason, it is often more informative when creating a choropleth map to use a variable that is a rate, for instance population divided by area to give population density:
pop_den.map <- ggplot() + geom_polygon(data = Hamilton_CT.t, aes(x = long, y = lat, group = group, fill = cut_number(POPULATION/AREA, 5)),color = "white", size = 0.1) +
scale_fill_brewer(palette = "YlOrRd") +
coord_fixed() +
theme(legend.position = "bottom") +
labs(fill = "Pop Density")
pop_den.map
It can be seen now that the population density is higher in the more central parts of Hamilton, Burlington, Dundas, etc. Does the map look random? If not, what might be an underlying process that explains the variations in population density in a city like Hamilton?
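To see why a rate can tell a different story than a count, consider the toy example below. The three zones, their values, and the units (square kilometres) are invented for illustration:

```r
# Hypothetical zones: the largest zone also has the largest population
zones <- data.frame(
  zone = c("rural", "suburban", "downtown"),
  POPULATION = c(6000, 4500, 3000),
  AREA = c(120, 9, 1.5),  # assumed to be in km^2 for this toy example
  stringsAsFactors = FALSE
)

# Population density: persons per unit of area
zones$POP_DENSITY <- zones$POPULATION / zones$AREA

# Ranking of zones by count and by density
rank_by_count <- zones$zone[order(zones$POPULATION, decreasing = TRUE)]
rank_by_density <- zones$zone[order(zones$POP_DENSITY, decreasing = TRUE)]

print(zones)
print(rank_by_count)
print(rank_by_density)
```

The ranking is reversed once the size of the zones is taken into account, which is the kind of confounding that mapping a rate is meant to avoid.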
Other times, it is appropriate to standardize not by area but by what might be called the population at risk. For instance, let's say that we wanted to explore the distribution of the population of older adults (say, 65 and older). In this case, normalizing by the total population, instead of by area, would remove the “size” effect, giving a proportion:
ggplot() + geom_polygon(data = Hamilton_CT.t, aes(x = long, y = lat, group = group, fill = cut_number((AGE_65_TO_ + AGE_70_TO_ + AGE_75_TO_ + AGE_80_TO_ + AGE_85)/POPULATION, 5)),color = NA, size = 0.1) +
scale_fill_brewer(palette = "YlOrRd") +
coord_fixed() +
theme(legend.position = "bottom") +
labs(fill = "Prop Age 65+")
Do you notice a pattern in the distribution of seniors in the Hamilton CMA?
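The proportion used in the map can also be stored as a column before plotting. The sketch below does this on a toy table that borrows the column names of the dataset (the values are invented). Returning NA for zones with zero population is a defensive choice of this sketch, not something the practice itself requires:

```r
# Toy tracts with the same age-group column names as the dataset (values invented)
tracts <- data.frame(
  TRACT = c("T1", "T2", "T3"),
  POPULATION = c(4000, 2500, 0),
  AGE_65_TO_ = c(200, 150, 0),
  AGE_70_TO_ = c(150, 125, 0),
  AGE_75_TO_ = c(100, 100, 0),
  AGE_80_TO_ = c(80, 75, 0),
  AGE_85 = c(70, 50, 0),
  stringsAsFactors = FALSE
)

# Columns that together make up the population aged 65 and older
senior_cols <- c("AGE_65_TO_", "AGE_70_TO_", "AGE_75_TO_", "AGE_80_TO_", "AGE_85")
tracts$SENIORS <- rowSums(tracts[, senior_cols])

# Proportion of seniors; NA where the denominator is zero
tracts$PROP_65_PLUS <- ifelse(tracts$POPULATION > 0,
                              tracts$SENIORS / tracts$POPULATION,
                              NA_real_)
print(tracts[, c("TRACT", "POPULATION", "SENIORS", "PROP_65_PLUS")])
```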
There are a few things to keep in mind when creating choropleth maps.
First, what classification scheme to use, with how many classes, and what colors?
The examples above were all created using a classification scheme based on the quintiles of the distribution. As noted above, these are obtained by dividing the sample into five equal parts to give the bottom 20% of observations, the next 20%, and so on. Quintiles are a particular case of a statistical measure known as quantiles; the median, for example, is the value obtained when the sample is divided into two equal-sized parts. Other classification schemes may be based on the mean, the standard deviation, and so on.
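The difference between classification schemes is easy to see on a small, skewed sample. The sketch below compares quintile classes with equal-interval classes using base R's quantile and cut, in the same spirit as the cut_number function used in the maps. The values are hypothetical:

```r
# Hypothetical, right-skewed values (think of zone populations)
values <- c(5, 120, 300, 450, 800, 950, 1200, 2600, 3900, 12000)

# Quintile breaks: the 0%, 20%, ..., 100% points of the distribution
quintile_breaks <- quantile(values, probs = seq(0, 1, by = 0.2))
quintile_class <- cut(values, breaks = quintile_breaks, include.lowest = TRUE)

# Equal-interval breaks over the same range, for comparison
equal_breaks <- seq(min(values), max(values), length.out = 6)
equal_class <- cut(values, breaks = equal_breaks, include.lowest = TRUE)

# Number of observations per class under each scheme
quintile_counts <- table(quintile_class)
equal_counts <- table(equal_class)
print(quintile_counts)
print(equal_counts)

# The median is the 50% quantile
print(median(values))
print(quantile(values, probs = 0.5))
```

Quintiles place the same number of observations in every class, whereas equal intervals crowd most of a skewed sample into the lowest class and can leave some classes empty.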
In terms of how many classes to use, often there is little point in using more than six or seven classes, because the human eye cannot distinguish color differences at a much higher resolution.
The colors are a matter of style, but there are coloring schemes that are colorblind safe (see here).
Secondly, when the zoning system is irregular (as opposed to, say, a raster), large zones can easily become dominant. In effect, much detail in the maps above is lost for small zones, whereas large zones, especially if similarly colored, may mislead the eye as to their relative frequency.
Another mapping technique, the cartogram, is meant to reduce the issues that arise when small and large zones are mapped together.
A cartogram is a map where the size of the zones is adjusted so that instead of being the land area, it is proportional to some other variable of interest.
Let's illustrate the idea behind the cartogram here.
In the maps above, the zones are faithful to their geographical properties. Unfortunately, this obscures the relevance of small zones. A cartogram can be weighted by another variable, say for instance, the population. In this way, the size of the zones will depend on the total population.
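The arithmetic behind this idea can be sketched in a few lines. The example below resizes each zone independently about its own centroid (sometimes called a non-contiguous cartogram), which is a simpler technique than a contiguous cartogram that deforms zones while keeping them attached. The zones, their values, and the square outline are all hypothetical:

```r
# Hypothetical zones: land area and population
zones <- data.frame(
  zone = c("A", "B", "C"),
  AREA = c(100, 10, 10),
  POPULATION = c(1000, 3000, 6000),
  stringsAsFactors = FALSE
)

# Target area: each zone's share of the total area equals its share of population
total_area <- sum(zones$AREA)
zones$TARGET_AREA <- total_area * zones$POPULATION / sum(zones$POPULATION)

# Area grows with the square of linear size, so each outline is resized
# by the square root of the ratio of target area to current area
zones$LINEAR_SCALE <- sqrt(zones$TARGET_AREA / zones$AREA)

# Resize a square outline standing in for zone A (side 10, area 100)
square <- data.frame(x = c(0, 10, 10, 0), y = c(0, 0, 10, 10))
cx <- mean(square$x)
cy <- mean(square$y)
s <- zones$LINEAR_SCALE[zones$zone == "A"]
resized <- data.frame(x = cx + s * (square$x - cx),
                      y = cy + s * (square$y - cy))

# Area of the resized square
resized_area <- diff(range(resized$x)) * diff(range(resized$y))
print(zones)
print(resized_area)
```

Zone A, which is large but sparsely populated, shrinks, while the small and populous zones B and C grow; the total area of the map is unchanged.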
Cartograms are implemented in R in the package cartogram.
CT_pop_cartogram <- cartogram(shp = Hamilton_CT, weight = "POPULATION")
Spatial object is not projected; GEOS expects planar coordinates
Mean size error for iteration 1: 5.94063097656517
Mean size error for iteration 2: 4.22282836609772
Mean size error for iteration 3: 3.24121918122005
Mean size error for iteration 4: 2.70487592121738
Mean size error for iteration 5: 2.44050500280525
Mean size error for iteration 6: 2.28238579460187
Mean size error for iteration 7: 2.14591026585647
Mean size error for iteration 8: 1.95255176465899
Mean size error for iteration 9: 1.8186943555214
Mean size error for iteration 10: 1.73775203417466
Mean size error for iteration 11: 1.64845060853967
Mean size error for iteration 12: 1.45162213155545
Mean size error for iteration 13: 1.37178691507706
Mean size error for iteration 14: 1.32738198642848
Mean size error for iteration 15: 1.29727221750122
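The message in the output above indicates that the spatial object is stored in geographic (longitude/latitude) coordinates, whereas the geometric operations expect planar coordinates. One way to address this is to project the object before computing the cartogram. The sketch below uses a made-up square near Hamilton rather than the practice data, and the choice of UTM zone 17N as the target coordinate system is an assumption, not something prescribed by this practice.

```r
library(rgdal)  # also loads sp, as in this practice

# Hypothetical square in longitude/latitude (not the practice data)
coords <- cbind(
  long = c(-80.0, -79.8, -79.8, -80.0, -80.0),
  lat = c(43.2, 43.2, 43.3, 43.3, 43.2)
)
square_ll <- SpatialPolygons(
  list(Polygons(list(Polygon(coords)), ID = "square")),
  proj4string = CRS("+proj=longlat +datum=WGS84")
)

# Geographic coordinates are not projected
is.projected(square_ll)

# Assumption: UTM zone 17N is a reasonable planar system for this area
square_utm <- spTransform(
  square_ll,
  CRS("+proj=utm +zone=17 +datum=WGS84 +units=m")
)

# After the transformation the object is in planar coordinates
is.projected(square_utm)
```

In this practice, the same call would be applied to Hamilton_CT, and the projected object would then be passed to cartogram.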
Notice that the value of the function cartogram (i.e., its output) is a SpatialPolygonsDataFrame. This object needs to be tidied if we wish to use ggplot2 to visualize it:
CT_pop_cartogram.t <- tidy(CT_pop_cartogram, region = "TRACT")
CT_pop_cartogram.t <- rename(CT_pop_cartogram.t, TRACT = id)
As before, the data were stripped from the tidied version of the dataframe, so they need to be restored:
CT_pop_cartogram.t <- left_join(CT_pop_cartogram.t, CT_pop_cartogram@data, by = "TRACT")
Column `TRACT` joining character vector and factor, coercing into character vector
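The message above is informational: the tidied dataframe stores TRACT as a character vector, whereas the data slot stores it as a factor, so the join converts the key to character. If you prefer to make the conversion explicit, a minimal sketch with made-up data (the object names tidied and attributes_df are hypothetical stand-ins) could look like this:

```r
library(dplyr)

# Hypothetical stand-in for the tidied polygons (character key)
tidied <- data.frame(
  TRACT = c("0001", "0001", "0002"),
  long = c(-79.9, -79.8, -79.7),
  lat = c(43.2, 43.3, 43.2),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the attribute table (factor key)
attributes_df <- data.frame(
  TRACT = factor(c("0001", "0002")),
  POPULATION = c(1200, 3400)
)

# Convert the factor key to character so both keys share a type
attributes_df <- mutate(attributes_df, TRACT = as.character(TRACT))

# Each polygon vertex now receives the attributes of its tract
joined <- left_join(tidied, attributes_df, by = "TRACT")
print(joined)
```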
Plotting the cartogram:
ggplot() +
  geom_polygon(data = CT_pop_cartogram.t,
               aes(x = long, y = lat, group = group, fill = cut_number(POPULATION, 5)),
               color = "white", size = 0.1) +
  scale_fill_brewer(palette = "YlOrRd") +
  coord_fixed() +
  theme(legend.position = "bottom") +
  labs(fill = "Population")
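The fill in the plot above is based on cut_number(POPULATION, 5), which groups the values into five classes containing approximately the same number of observations. A small sketch with made-up population values (not taken from the Hamilton data) shows the behavior:

```r
library(ggplot2)

# Hypothetical, strongly skewed population values
population <- c(50, 120, 300, 450, 800, 1500, 2200, 4100, 6000, 9000)

# Five classes with roughly equal numbers of observations
classes <- cut_number(population, 5)

# Class intervals differ in width, but each holds the same count here
print(table(classes))
```

Because the classes are defined by counts rather than by equal-width intervals, a few very large values do not dominate the color scale.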
Notice how the size of the zones has been adjusted.
The cartogram can be combined with coloring schemes, as in choropleth maps:
CT_popden_cartogram <- cartogram(Hamilton_CT, weight = "POP_DENSIT")
Spatial object is not projected; GEOS expects planar coordinates
Mean size error for iteration 1: 29.0772344366743
Mean size error for iteration 2: 26.933147928614
Mean size error for iteration 3: 25.1986261892098
Mean size error for iteration 4: 23.7327939549969
Mean size error for iteration 5: 22.4463390026744
Mean size error for iteration 6: 21.2818852763495
Mean size error for iteration 7: 20.2025834378354
Mean size error for iteration 8: 19.1839656200552
Mean size error for iteration 9: 18.2099242308777
Mean size error for iteration 10: 17.2709049215263
Mean size error for iteration 11: 16.3609019577218
Mean size error for iteration 12: 15.4772258019554
Mean size error for iteration 13: 14.619308950628
Mean size error for iteration 14: 13.7881685736231
Mean size error for iteration 15: 12.9857607680246
Tidy and restore the data:
CT_popden_cartogram.t <- tidy(CT_popden_cartogram, region = "TRACT")
CT_popden_cartogram.t <- rename(CT_popden_cartogram.t, TRACT = id)
CT_popden_cartogram.t <- left_join(CT_popden_cartogram.t, CT_popden_cartogram@data, by = "TRACT")
Column `TRACT` joining character vector and factor, coercing into character vector
pop_den.cartogram <- ggplot() +
  geom_polygon(data = CT_popden_cartogram.t,
               aes(x = long, y = lat, group = group, fill = cut_number(POP_DENSIT, 5)),
               color = "white", size = 0.1) +
  scale_fill_brewer(palette = "YlOrRd") +
  coord_fixed() +
  theme(legend.position = "bottom") +
  labs(fill = "Pop Density")
pop_den.cartogram
By combining a cartogram with choropleth mapping, it becomes easier to appreciate the way high population density is concentrated in the central parts of Hamilton, Burlington, etc.
grid.arrange(pop_den.map, pop_den.cartogram, nrow = 1)
This concludes Practice 10.